MaxSR: Image Super-Resolution Using Improved MaxViT
While transformer models have been demonstrated to be effective for natural
language processing tasks and high-level vision tasks, only a few attempts have
been made to use powerful transformer models for single image super-resolution.
Because transformer models have powerful representation capacity, and their
built-in self-attention mechanisms help to exploit the self-similarity prior in
the input low-resolution image to improve single image super-resolution
performance, we present a single image super-resolution model based on the
recent hybrid vision transformer MaxViT, named MaxSR. MaxSR
consists of four parts: a shallow feature extraction block, multiple cascaded
adaptive MaxViT blocks that extract deep hierarchical features and model global
self-similarity from low-level features efficiently, a hierarchical feature
fusion block, and finally a reconstruction block. The key component of MaxSR,
the adaptive MaxViT block, is based on the MaxViT block, which mixes MBConv with
squeeze-and-excitation, block attention, and grid attention. In order to achieve
better global modelling of self-similarity in the input low-resolution image, we
extend the block attention and grid attention of the MaxViT block into adaptive
block attention and adaptive grid attention, which perform self-attention inside
each window across all grids and inside each grid across all windows,
respectively, in the most efficient way. We instantiate the proposed model for
classical single image super-resolution (MaxSR) and lightweight single image
super-resolution (MaxSR-light). Experiments show that MaxSR and MaxSR-light
efficiently establish new state-of-the-art performance.
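The block and grid attention that the abstract builds on can be illustrated with a minimal NumPy sketch. It shows only the standard MaxViT-style partitions (local windows versus strided grids), not the adaptive variants proposed in MaxSR, whose details the abstract does not give. The function names, the channels-last layout, and the attention with identity Q/K/V projections are assumptions made for illustration.

```python
import numpy as np

def block_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Returns (num_windows, p*p, C); attention over axis 1 is local.
    Assumes H and W are divisible by p.
    """
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p, c)

def grid_partition(x, g):
    """Split an (H, W, C) map into groups of g x g strided samples.

    Returns (num_groups, g*g, C); each group holds pixels spaced
    H/g rows and W/g columns apart, so attention over axis 1 is
    sparse but spans the whole image. Assumes H and W are divisible by g.
    """
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)            # (H/g, W/g, g, g, C)
    return x.reshape(-1, g * g, c)

def self_attention(tokens):
    """Softmax attention with identity Q/K/V projections (illustration only)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# Usage: an 8 x 8 single-channel map whose values are the pixel indices.
x = np.arange(64, dtype=float).reshape(8, 8, 1)
windows = block_partition(x, 4)   # 4 windows of 16 neighbouring pixels
grids = grid_partition(x, 4)      # 4 groups of 16 pixels, stride 2 apart
```

In this toy setting the first window holds the top-left 4 x 4 patch, while the first grid group holds every second pixel in both directions, which is why stacking the two attention types mixes local and global context.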
Efficient Single Image Super-Resolution Using Dual Path Connections with Multiple Scale Learning
Deep convolutional neural networks have been demonstrated to be effective for
single image super-resolution (SISR) in recent years. On the one hand, residual
connections and dense connections have been used widely to ease the forward flow
of information and the backward flow of gradients, which boosts performance.
However, current methods use residual
connections and dense connections separately in most network layers in a
sub-optimal way. On the other hand, although various networks and methods have
been designed to improve computation efficiency, save parameters, or use
training data of multiple scale factors jointly to boost performance, they
either perform super-resolution in HR space, which incurs a high computation
cost, or cannot share parameters between models of different scale factors to
save parameters and inference time. To tackle these challenges, we propose an
efficient single image super-resolution network using dual path connections
with multiple scale learning, named EMSRDPN. By introducing dual path
connections inspired by Dual Path Networks, EMSRDPN uses residual connections
and dense connections in an integrated way in most network layers.
Dual path connections combine the benefits of residual connections, which reuse
common features, and dense connections, which explore new features, to learn a
good representation for SISR. To utilize the feature correlation of multiple
scale factors, EMSRDPN shares all network units in LR space between different
scale factors to learn shared features and keeps only the reconstruction unit
separate for each scale factor. This lets training data of different scale
factors benefit one another to boost performance, while saving parameters and
supporting shared inference for multiple scale factors to improve efficiency.
Experiments show that EMSRDPN achieves better performance and comparable or even
better parameter and inference efficiency than SOTA methods.
Comment: 21 pages, 9 figures, 5 tables
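The two ideas in this abstract, dual path connections and a shared LR-space trunk with one reconstruction unit per scale factor, can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the EMSRDPN architecture: pointwise (1 x 1) convolutions stand in for the real layers, the pixel-shuffle upsampling in the reconstruction unit is an assumption, and all names, channel counts, and random weights are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """Pointwise convolution: (H, W, C_in) map times (C_in, C_out) weight."""
    return x @ weight

def dual_path_block(res, dense, weight, res_channels):
    """One dual-path step (hypothetical layer layout).

    res:   (H, W, R) residual path, updated by addition (feature reuse).
    dense: (H, W, D) dense path, grown by concatenation (new features).
    weight: (R + D, R + K) pointwise weights, where K is the growth rate.
    """
    x = np.concatenate([res, dense], axis=-1)
    y = np.maximum(conv1x1(x, weight), 0.0)                # ReLU
    res_out = res + y[..., :res_channels]                  # residual connection
    dense_out = np.concatenate([dense, y[..., res_channels:]], axis=-1)  # dense connection
    return res_out, dense_out

def pixel_shuffle(x, r):
    """Rearrange (H, W, C*r*r) into (H*r, W*r, C); assumed upsampler."""
    h, w, c = x.shape
    c_out = c // (r * r)
    x = x.reshape(h, w, r, r, c_out).transpose(0, 2, 1, 3, 4)
    return x.reshape(h * r, w * r, c_out)

def build_model(rng, res_channels=8, growth=4, num_blocks=3, scales=(2, 3, 4)):
    """Random weights: one shared trunk, one reconstruction head per scale."""
    trunk, dense_channels = [], growth
    for _ in range(num_blocks):
        shape = (res_channels + dense_channels, res_channels + growth)
        trunk.append(rng.normal(scale=0.1, size=shape))
        dense_channels += growth
    feat = res_channels + dense_channels
    heads = {s: rng.normal(scale=0.1, size=(feat, 3 * s * s)) for s in scales}
    return trunk, heads

def super_resolve(res, dense, trunk, heads, scale, res_channels=8):
    """Run the shared LR-space trunk, then the head for the requested scale."""
    for weight in trunk:
        res, dense = dual_path_block(res, dense, weight, res_channels)
    feat = np.concatenate([res, dense], axis=-1)
    return pixel_shuffle(conv1x1(feat, heads[scale]), scale)

# Usage: the same trunk weights serve every scale factor.
rng = np.random.default_rng(0)
trunk, heads = build_model(rng)
res0 = rng.normal(size=(5, 6, 8))     # stand-in shallow features, residual path
dense0 = rng.normal(size=(5, 6, 4))   # stand-in shallow features, dense path
outputs = {s: super_resolve(res0, dense0, trunk, heads, s) for s in (2, 3, 4)}
```

Because every block runs at LR resolution and only the final head depends on the scale factor, the trunk features could in principle be computed once and reused for several scale factors, which is the kind of shared inference the abstract describes.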